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Abstract: The Open Drug Discovery Teams (ODDT) project 
provides a mobile app primarily intended as a research 
topic aggregator of predominantly open science data col- 
lected from various sources on the internet. It exists to fa- 
cilitate interdisciplinary teamwork and to relieve the user 
from data overload, delivering access to information that is 
highly relevant and focused on their topic areas of interest. 
Research topics include areas of chemistry and adjacent 
molecule-oriented biomedical sciences, with an emphasis 
on those which are most amenable to open research at 
present. These include rare and neglected diseases, and 
precompetitive and public-good initiatives such as green 
chemistry. The ODDT project uses a free mobile app as user 
entry point. The app has a magazine-like interface, and 
server-side infrastructure for hosting chemistry-related data 
as well as value added services. The project is open to par- 



ticipation from anyone and provides the ability for users to 
make annotations and assertions, thereby contributing to 
the collective value of the data to the engaged community. 
IVluch of the content is derived from public sources, but 
the platform is also amenable to commercial data input. 
The technology could also be readily used in-house by or- 
ganizations as a research aggregator that could integrate 
internal and external science and discussion. The infrastruc- 
ture for the app is currently based upon the Twitter API as 
a useful proof of concept for a real time source of publicly 
generated content. This could be extended further by ac- 
cessing other APIs providing news and data feeds of rele- 
vance to a particular area of interest. As the project evolves, 
social networking features will be developed for organizing 
participants into teams, with various forms of communica- 
tion and content management possible. 
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1 Introduction 

Tools for drug discovery collaboration predominantly re- 
volve around desktop computer applications'^'^' though 
these authors believe that we will see a natural evolution 
toward the use of mobile apps in the drug discovery labo- 
ratory.'^' Recently tools for selective sharing of chemistry 
and biology data have used "software as a service" (SaaS) 
as a business model e.g. CDD and HEOS.'^'^"' These software 
platforms have seen their user bases grow, especially in the 
neglected disease communities for tuberculosis and malar- 
ia. They can also be useful for the secure sharing of data 
with collaborators in which retention of intellectual proper- 
ty (IP) is important. We are increasingly seeing a shift to 
more companies, institutes and researchers openly sharing 
data. This again has been predominantly in the neglected 
disease space (e.g. GSK, Novartis and St. Jude's sharing of 
malaria data).'^"'' Alongside this there are increasing efforts 
by researchers to publish in open access journals and re- 
lease data into open or free databases.'^""^^' In addition 
online tools are also being used to store science related 
content. Examples include FigShare,"^' SlideShare,'"*' Drop- 
box"^' etc. We should also not forget the efforts of various 
consortia to coordinate and make "big data" accessible. 
These include projects such as Elixir'^''''^' and Open 
PHACTS"^'^' and a myriad of other initiatives to free up 



data that was once the sole preserve of the pharmaceutical 
industry.'^""^^' How can we possibly connect all of this data 
and impact research? 

Social media platforms themselves can be useful for shar- 
ing data as evidenced by the success of blogs and wikis, 
both of which are commonly used as platforms for Open 
Notebook Science (ONS). These tools allow scientists to de- 
scribe their scientific methods and results in near real time, 
link to other content such as uploaded images, graphs and 
models, among others"" "' and do it all under open licens- 
es.'^'*' But how do you find these pioneering scientists that 



[a] S. Ekins 
Collaborations in Chemistry 

5616 Hilltop Needmore Road, Fuquay Varina, NC 27526, USA 
*e-mail: ekinssean@yahoo.com 

[b] A. M. Clark 

Molecular Materials Informatics 

1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1 

[c] A. J. Williams 

Royal Society of Chemistry 
904 Tamaras Circle, Wake Forest, NC-27587, USA 
[if Re-use of this article is permitted in accordance with the Terms 
and Conditions set out at http://onlinelibrary.wiley.com/journal/ 
10. 1002/(ISSN) 1868-1 751/homepage/2022_onlineopen.html. 



Mol. Inf. 2012, 31, 585 - 597 



2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 



®WILEYlfll 585 

ONLINE LIBRARY 



Full Paper 



offer potentially valuable content? We believe the new ef- 
forts to create scientific mobile apps may have a role to 
play in collaboration.'^"'^^' 

Another parallel concern is that of information overload 
in some subject areas: how are we expected to keep 
abreast of the avalanche of data flooding in from all sour- 
ces? If you are a non-scientist, how do you find information 
on a scientific topic (besides using online search engines 
and referring to Wikipedia), connect with researchers, or 
even hope to make a contribution in a scientific area if you 
are a citizen scientist, biohacker'^^' or patient advocate? The 
following describes how three separate sources of inspira- 
tion have led us to create a new mobile app called Open 
Drug Discovery Teams (ODDT) for the sharing of open sci- 
entific data and, more specifically, bringing these tools to 
bear on the topic of drug discovery. 

After one of us (SE) spoke with the parent of child with 
a rare disease, the parent shared that their experience post 
diagnosis suggested that they had to quickly become ex- 
perts on a rare disease as quickly as possible. This involved 
finding and reading scientific papers on the topic (many of 
which were available via subscription only), and connecting 
with researchers around the globe to find out more about 
the disease. Ultimately, like many parents in a similar situa- 
tion, this led to the foundation of their own non-profit to 
fund researchers, once they had found the scientists and 
clinicians that could perform the scientific studies necessary 
to help in their quest to begin to find a cure for their child. 
Additionally, we are seeing such parents also form compa- 
nies as an additional way to expedite translation of the sci- 
ence for their rare disease to the clinic. This represents an 
example of what we are seeing on a large scale as there 
are over 7000 rare diseases. At the other extreme are ne- 
glected diseases which kill millions of people every year 
(e.g. tuberculosis and malaria), and although there is fund- 
ing for research from foundations and governments, prog- 
ress is slow and not well coordinated, with groups re- 
screening large libraries of compounds in whole cell sys- 
tems (that have unbeknownst to them already been 
screened elsewhere) and only rarely using knowledge or 
computational tools that has been generated before or 
elsewhere. For these neglected diseases we are seeing 
a few open efforts to do drug discovery collaboratively for 
malaria and tuberculosis.'^^' Can we foster the same spirit in 
the area of very rare diseases? 

A second source of inspiration was the ScienceOnline 
2012 conference held in Jan 2012'^^' in which there were 
sessions on data overload and ONS. Various scientists de- 
scribed their strategies for dealing with the day-to-day chal- 
lenge of overwhelming email, tweets, and other social 
media and news alerts etc. and trying to manage them all. 
The ONS discussion illustrated that many scientists were 
using their blog as an open notebook and then tweeting 
links from it to their contacts. 

The third source of inspiration was the Pistoia Alliance.'^"' 
This is a pre-competitive initiative in which pharmaceutical 
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companies and software vendors are attempting to devel- 
op standards that can be used by all companies for their 
software systems and, where appropriate, developing soft- 
ware tools collaboratively rather than each pharmaceutical 
company developing its own informatics tools.'^" In Febru- 
ary 2012 they held a reception at the Royal Society of 
Chemistry with attendees from pharmaceutical companies, 
software companies, and venture capital funders. As an ice 
breaker they organized a Dragon's Den'^^' type competition 
in which they wanted to gather ideas of a new external 
"R&D Service" that would transform the Pharmaceutical 
R&D sector and that would be in production and fit-for-use 
in 2015. 

The intersection of all of these sources of inspiration en- 
couraged us to participate in the Dragon's Den. The follow- 
ing describes the ideation, software development, feedback 
and ideas for future development of ODDT. In keeping with 
the openness espoused in this article we shared our con- 
cepts and ideas in blogs including the incremental devel- 
opment and work in progress iterative reports associated 
with the technology development.'""^^' 



2 Methods 

2.1 Ideation 

Using an iPad with the Sketchbook IVlobile Express app (Au- 
todesk, Inc.'^*"') one of us (SE) sketched out several slides of 
ideas (Figure 1 A-E) which laid out the concept of one find- 
ing the best collaborators to do open research and then 
developing a tool to connect the open data and the re- 
searchers. Although these slides are perhaps simplistic they 
were a modern day "back of an envelope" sketch that con- 
nected the basic concepts from all the sources of inspira- 
tion. After sharing these slides with the co-authors it was 
decided to develop a prototype for the Dragon's Den com- 
petition that, initially, would harvest the Twitter feeds on 
a fixed number of specific research topics. 



2.2 Development 

One of us (AC) has developed a mobile chemistry aware 
platform (the Mobile Molecular Datasheet) which imple- 
ments powerful cheminformatics features. The core func- 
tionality is based on solving the problem of providing 
a high quality chemical structure editor for a palm-sized 
mobile device with highly constrained input functionality.'^^' 
This product has since matured and added extensive de- 
rived functionality, such as reaction editing, datasheet man- 
agement, web-services access and collaboration tools. The 
functionality has been packaged in library form, and used 
as the basis for a number of other apps with more task- 
specific functionality.'^^' 
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Figure 1. How the initial idea for ODDT was sketched out around core concepts. A) Connecting scientists. B) Cloud-based software to help 
collaborations. C) Integrating tools to find the right scientists. D) Integrating sources of social media, open and commercial data in a central 
hub. E) How the mobile interface could integrate the data and be perused. 



2.3 Client App 

The primary user interface is via the ODDT app, for iOS- 
based devices (iPhone, iPod and iPad). The app provides 



a user interface that is inspired by the "magazine-like" Flip- 
board app:'^^' the user initially selects from a list of topics, 
and from there can flip through recently posted content. 
The app is free for anyone to use, and provides content- 
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consumption features as its primary purpose. As it evolves, 
the app will also be used to participate in active content- 
sharing and social networking activities. 

2.4 Server 

The server-side architecture is based on the com.mmi soft- 
ware stack, used by the molsync.com service, which current- 
ly provides additional cheminformatics capabilities for vari- 
ous mobile apps.''*"' The extensions designed to host the 
ODDT project make use of Twitter as the primary source of 
content, which is regularly polled and assimilated into the 
data collection. The service provides an API for accessing 
ODDT topics and content. As the project evolves, the 
server will be gradually augmented to recognize particular 
data sources and information streams, and provide value 
added functionality. Currently it is able to recognize chemi- 
cal data such as molecular structures, reactions and data- 
sheets. 

2.5 Development Milestones to Date 

The ODDT project is designed so that it can be ramped up 
on a gradual curve and has provided useful functionality 
from the beginning. Because the content source is initially 
derived from Twitter, which consists entirely of public data, 
has a public API, and has a huge existing user base, the 
ODDT project can demonstrate initial utility by providing 
added value based on the existing data stream. 

We initially defined a small number of topics, corre- 
sponding to Twitter hashtags such as: #tuberculosis, #ma- 
laria, #hivaids, #huntingtons, #sanfilipposyndrome, #leish- 
maniasis, #chagas and #greenchemistry. We have built the 
server architecture for harvesting URLs from these tweets, 
and gathered them together into a collection that can be 
accessed via an API. We have performed some minimal rec- 
ognition of the links and, in particular, annotated those 
which lead to chemical data content, as well as enumerat- 
ing a list of embedded images within the corresponding 
web page. We have also hosted the server on molsync.com 
initially, with the intention of relocating at a later date. 

The client app, with its "Flipboard-like" interface, is at 
a minimum viable product stage: it provides browsing 
functionality for the topics defined by the ODDT project. 
The content for each topic roughly corresponds to individ- 
ual tweets with hashtags matching the predefined subject 
areas. Links that lead to chemical data (e.g. molecules, reac- 
tions, datasheets) are viewable using cheminformatics func- 
tionality embedded into the app, while actual editing is 
possible by opening the content using other apps installed 
on the device. 

Other links are currently treated as generic web pages: 
linked images are shown as thumbnails, and each page can 
be viewed with an inline web browser. This represents 
a first in class prototype mobile app for aggregating open 
data on scientific topics and diseases that are rare or ne- 
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glected in particular. The app is currently able to make use 
of the operator's Twitter account(s) to emit endorsements 
and comments for content items, which are subsequently 
harvested by the server in order to rank content and 
decide which documents should be retained in the system. 
This serves as a basic crowdsourced curation method, 
which will be expanded upon. During this prototype ver- 
sion the method for users to add content to ODDT is to 
use a standard Twitter client to emit links to relevant con- 
tent, and use one of the topic hashtags. The client app and 
server architecture do not, at this stage, identify team 
members or provide any direct ways to influence content, 
other than rating or commenting on it. An app from Molec- 
ular Materials Informatics, MolSync, has been enhanced to 
make it particularly straightforward for users to inject 
chemical data into any of the topic streams. 

2.6 Testing and Feedback 

We initiated an alpha version of the software providing it 
to a dozen or so researchers with an interest in testing it. 
This provided an opportunity to address usability issues, fix 
bugs, and observe use patterns for scientists with a diverse 
range of experience with mobile and social networking 
software. Following the conclusion of the alpha test, we 
submitted the beta version to the iTunes AppStore, with 
the intention of growing the number of test users by 
making it freely available to anyone with an iOS-based 
device. 



2.7 Support and Training 

The free app will not have any official technical support, 
though unofficially we are very interested in the users' ex- 
perience with the app, both positive and negative. Re- 
sponses from users regarding issues, bugs, and popular fea- 
tures will strongly influence the evolution of the product. 
As the community grows, the inherent communication fea- 
tures of the app will provide an intrinsic support group. 
The free app will ultimately have videos and documenta- 
tion online at the project website, oddt.net, and at scimobi- 
leapps.com.^'^^''^'^' 



3 Results 

Initially we used the app to harvest Twitter feeds on the 
hashtags described above for the diseases: malaria, tuber- 
culosis, Huntington's disease, HIV/AIDS, and Sanfilippo syn- 
drome as well as the research topic green chemistry,''*^' 
which is of interest because its community is highly recep- 
tive to open collaboration. We have since added #H5N1 
(bird flu) and #hh4gan (Giant Axonal Neuropathy). All of 
these subjects have high potential for positively impacting 
the research environment using computational approaches 
and dissemination information via mobile apps.'""'*^' We 
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have used Twitter to feed content into these topics, by pro- 
viding linl<s to molecules and links to structure-activity 
tables. This minimal app was described at the Pistoia Alli- 
ance Dragon's Den meeting along with a few of the initial 
concept sketches. Since then we have added the ability to 
endorse or reject tweets, expanded the number of topics 
by adding Chagas Disease and Leishmaniasis as well as 
a topic for information on ODDT In addition the ability to 
visualize a thumbnail image for each tweet was added, as 
well as recognition of linked images. The app can now be 
used to manage multiple twitter accounts for the user. We 
also added the ability to summarize the endorsement and 
rejection activity for the user, arranging the topics on the 
screen from most to least used (Figures 2-8). The entry 
screen to the app displays the topics ranked by use 
(Figure 2). Tapping a topic image opens the topic browser, 
starting with the incoming page. Newly added content is 
listed on the right (Figure 3). Each tweet that has been in- 
troduced into the data collection is referred to a factoid. A 
factoid can be endorsed or disapproved (Figure 4), and the 
hyperlinks can be followed. The recent page shows factoids 
with one or more vote (Figure 5) while the content section 
shows the most popular voted content in rank order 
(Figure 6). iVlolecule thumbnails can be tapped to open in 
other apps like MolSync (Figure 7A) or SAR Table (Fig. 7B). 
Back on the entry screen, tapping the statistics summary 
button opens a listing of endorsements, disapprovals, injec- 
tions and retirements for each hashtag (Figure 8). We have 
described this work at several meetings including the 
American Chemical Society'"'"' and will continue to promote 



its use in the scientific communities appropriate for each 
hashtag. From app idea to submission to the iTunes App- 
Store took a little over 2 months. 



4 Discussion 

We are witnessing the growing menace of both increasing 
cases of drug-sensitive and drug-resistant Mycobacterium 
tuberculosis strains and the challenge to produce the first 
new tuberculosis (TB) drug in well over 40 years. Tuberculo- 
sis kills 1.8 million people every year (~1 every 8 sec- 
onds).'''^' This is equivalent to malaria in terms of morbidity 
and it is believed that one third of the world's population 
is infected. Drug-drug interactions and co-morbidity with 
HIV are also important for TB. Recent years have seen in- 
creased investment from the Bill and Melinda Gates Foun- 
dation (BMGF) and the National Institutes of Health (NIH) 
but this has not (as yet) had a big impact on new clinical 
candidates and of those identified the majority of these are 
coming from pharmaceutical companies.'"^' The clinical TB 
pipeline is therefore thin and weak.'"^' Compounding this is 
the lack of data sharing on the scale of malaria where sig- 
nificant efforts have been made to assure the availability 
and access to data.'^"^' There are already open source ef- 
forts for drug discovery for tb'^°"^^' and malaria'^^' in India 
(OSDD) but this has no real international traction yet. In the 
USA there are disconnected efforts for collaborative TB 
drug discovery, including those of Collaborative Drug Dis- 
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Figure 2. Entry screen for ODDT. 9 topics are shown, which can be tapped to open content. The side bar has other informational icons. 
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Figure 3. Tuberculosis topic page listing incoming content. Factoids are shown with hyperlinks and a voting button. 
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Figure 4. Tuberculosis topic page listing incoming content showing voting options. 
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Figure 5. Tuberculosis topic page showing recent factoids with one or more vote. 
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Figure 6. Tuberculosis topic page showing content factoids sorted in order of most votes in favor. 



Mol. Inf. 2012, 31, 585 - 597 



© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 



www.molinf.com 591 



Full Paper 

A 



S. Ekins et al. 



MolSync Sharing 



Download Datasheet 



Tuberculosis drugs 







DMa 




JM 

" J 


NuiMc ettiambutoJ 






Noma: Boniazid 


Ba«i*3 


0 


Name: pyrazinanude 




0 0 





B 










^^^^^ 



R3^ 

T 

R4 


12 0'^N«'X_/^ 
-LA, °~N 0-\ 


R-H 


R-H 


R-[- 


R-h 


R-H 




>d:7a 

MABA MIC: 2S.1 uM 
LORAMIC; 13.2 uM 
Vero IC50: > 128 uM 


R4^^|^*^Nr R1 
R5 


R— H: 


R— H: 


R— H 


R— H 


A- 

F 


F— 
F 


Id: 7b 

MABA MIC: 10.6 uM 
LORAMIC: 66.6 uM 
Vero IC50- > 128 uM 


R4'^?^N R1 
R5 


F 

-+ 

F 


FR- 


HR- 


HRH 


HiR-H 


F ^ 


id: 7c 

MABA MIC: 3.2 uM 
LORAMIC: 2S.1 uM 
Vero IC50: > 128 uM 


R 

T 

n 


N R1 

5 


F 


1-hIr 


F 

-h 

F 


r-hI 


R-H 




Id; 7d 

MABA MIC: 3.7 uM 
LORAMIC: 7.6 uM 
Vero IC50:> 128 uM 


"'Til ""^ " 

R4''^1*'^N R1 
R5 


F 


1-HR 


— H R- 


F 




0'''*^V**\__J^ MABAMIC:1.3uM 
^ X 0-*f 0-\ LORAMIC:nd 
/^!5;^^-^ u-IM \j \^ VofolC50:>ia8uM 


F 




F 


F 


-FR- 


4-i:r— 


■H R— 1- 


F ^ id: 71 
F— I— F n'''''N'tf=*\ # ;MABAMIC:3.auM 

Y 1 > — \ LORAMIC nd 
^Asj^Jts^ 0--N * ""^ 



Figure 7. Examples of links out from the ODDTapp. A) A set of tuberculosis drug structures from the MolSync app/™' B) A structure activi- 
ty table from the SAR Table app.'"' 
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Figure 8. Statistics summary, showing endorsements, disapprovals, injections and retirements. 



NIH. BMGF does not extensively fund European efforts 
while FP7 grants fund the More Medicines for Tuberculosis 
{MM4TB) project'"' as a collaborative pharma-academia col- 
laboration (but this is a closed initiative between those 
groups involved). The result is that there is significant TB 
data hoarding and scientists are not learning from the fail- 
ures of others resulting from these already well-funded ef- 
forts. We believe that there needs to be a change by pro- 
viding the neglected disease community with a way to 
make more data and content available on these diseases. 
This includes access to scientific papers or the key results 
and molecules. 

Traditionally, rare (orphan) diseases have very small pa- 
tient populations compared with neglected diseases, 
though there is no global agreement on what this size Is. 
In the US it is described as a disease that affects less than 
200000 people. Some estimates suggest this category in- 
cludes over 7000 rare diseases affecting 25-30 million 
people. Even though over 300 orphan drugs have 
been approved since the passage of the Orphan Drugs Act 
in 1983, there is still a long way to go until most rare dis- 
eases have a treatment.'^*"-^^' The recent publicity around 
the ULTRA act and TREAT act legislation suggests there is 
considerable interest to promote more research into rare 
diseases and translational research'^^'^^' and that this may 



lead to commercial opportunities. We believe the field of 
rare diseases needs increased collaboration and open shar- 
ing of data in order to facilitate discovery of new therapeu- 
tic approaches. 

At the same time we are seeing that drug discovery is 
shifting focus from the industry to outside partners and in 
the process creating new bottlenecks, suggesting the need 
for a more disruptive overhaul. With new mobile computer 
hardware and software applications, aspects of drug dis- 
covery will be done anywhere by potentially anyone, while 
emerging technologies could be used to reimagine the 
drug discovery process.'^"' 

Whether one mobile app like ODDTcan be used to revo- 
lutionize the way we perform neglected and rare disease 
research, and in turn the pharmaceutical industry, remains 
to be seen and we perhaps are quite early on in the pro- 
cess. We have however successfully put together a novel 
tool for open drug discovery collaboration that did not 
exist previously and leveraged social media. 

Stimulated by the Pistoia Alliance Dragon's Den experi- 
ence it is important to have a clear value proposition which 
can be summarized as follows. The project is intended to 
bring together freely accessible and open data in a single ag- 
gregated collection, and then facilitate the forming of open 
research teams around this data. Such teams consist of indi- 



Mol. Inf. 2012, 31, 585 - 597 



© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 



www.molinf.com 593 



Full Paper 

vidual researchers in academia and industry, as well as other 
interested parties, including the general public, who have 
a desire to be kept up to date with research in a particular 
subject (disease) area. 

By assembling a focused group of disparate individuals 
and organizations scattered about the globe, the opportu- 
nity exists to disseminate important information to a highly 
relevant target audience. This includes information about 
new scientific developments of interest, to network and 
discover other researchers with complementary interests, 
and opportunities to collaborate in other ways such as 
highlighting conferences, publications or laboratory data 
sharing. 

As the software collection evolves and improves in its ca- 
pabilities, data from open science will be integrated and 
made usable within a common framework. Team members 
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will be able to borrow and reuse a growing collection of 
existing data. Networking and collaboration features will be 
introduced, allowing researchers to post incomplete or 
speculative data, while other researchers fill in missing de- 
tails, and the community as a whole can debate, contest or 
endorse data based on its quality. ODDT may also contrib- 
ute to the nascent development of "nanopublications".'^^' 
The app could also be used as a type of open lab notebook 
whereby individual researchers share links (URLs) to con- 
tent and the app aggregates these. 

We certainly envisage that we could capture data from 
other sources, and are actively considering population of 
the ODDT data collection with information from sources 
such as Google Alerts,"^'' Google Scholar,"*'' PubMed,'"" and 
any other service that provides an API and a preliminary 
method for categorizing scholarly works (Figure 9). As 
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ODDT expands we will require ontologies and data curation 
(automated and manual) as well as other steps to filter 
data and ensure relevancy for the topic. We are also consid- 
ering means by which new topics can be created by user 
consensus, though in the early stages of the project while 
the user community remains small, new topics will contin- 
ue to be added manually. Recently we have started a crowd- 
funding campaign using IndieGoGo""^' to assist in the inte- 
gration of additional data sources and the donor rewards 
are the selection of additional rare diseases. 

We may in future introduce a number of additional types 
of annotations that can be added to a factoid. Assuming 
that the factoid originated from a link to a core document, 
these annotations can be used to define commonly expect- 
ed subsections, in a somewhat structured format. Some of 
the annotation types we are considering are: 

- Description: A title-like summary of the content. 

- Synopsis: An abstract-like paragraph summary of the 
content. 

- Comment: A short comment discussing the content's 
relevance or lack thereof. 

- Review: A detailed analysis of the content, meant to 
evaluate it critically and inform readers of what can be 
learned from it, and any caveats that the reader should 
be wary of. 

- Subtopic: A classification within the topic, to encourage 
grouping. 

- A keyword for subsequent classification and recall. 

- Figure or image: A graphic, possibly with a caption. Typ- 
ically provided as a web standard image format, e.g. 
SVG, PNG. 

- Table: A data table of some form, loosely defined. A 
wide range of formats, often machine interpretable, e.g. 
delimited text, Excel, etc. 

- Cliemical data: A highly structured data bundle that 
represents some collection of chemical structures, reac- 
tions, auxiliary data, and other well defined content that 
is machine interpretable. 

These annotations can come from a variety of different 
users, and may be duplicates. When compiling optimally 
presentable results, it may be necessary to curate these by 
way of a priority system. 

While this system is nominally intended for annotation of 
content that is fully accessible via the original URL, it is po- 
tentially quite useful to annotate documents to which 
many users do not have access, e.g. articles that are behind 
a journal pay wall. The annotation of journal articles is al- 
ready possible using exemplar document readers such as 
Utopia documents. The collective annotations of the 
users who have access to the article may provide enough 
information to allow some users to bypass the purchasing 
of the article. It also has potential as a path to assemble ar- 
ticles from scratch, in a fully open medium, and institute 
a kind of informal peer review. In addition this functionality 
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goes further towards an open notebook use case scenario 
in which the scientist publishes their results whenever 
a suitable point is reached, whether or not the results are 
sufficiently complete to warrant dissemination in a tradition- 
al peer reviewed article. 

Users of ODDT can either be passive (lurkers, browse 
only) or active (crowdsourced addition and curation of con- 
tent). The cumulative curation efforts of active users are 
what will make the project successful, and so these activi- 
ties should be encouraged and rewarded as much as possi- 
ble. Because all of the information flows through Twitter, 
which is public, the users gain peer recognition via their 
Twitter accounts but it is also possible that the participa- 
tion could be measured using Alt-Metrics""^' approaches 
like Total Impact"^**' and Altmetric."*'' ODDT also tracks statis- 
tics on user activity associated with each Twitter account. 
By designing a set of activity milestones and assigning 
badges or trophies for activity overall, or a ranking system 
for individual topics, a kind of "compulsion loop" can be 
designed to keep users interested in continued participa- 
tion. 

Curated content from ODDT can be compiled from its 
raw sources, and combined with the various kinds of 
crowdsourced input, and periodically produced as an 
online "booklet or review". To a large extent it will be possi- 
ble to do this in an automated fashion, but some manual 
oversight and editing may be required. If sufficient produc- 
tion levels are able to be maintained, access to these 
"booklets" could be made available on a fee-based sub- 
scription basis. This fee ought to be waived for users who 
participated in curation at a specified badge/ trophy level 
over the period. 

Since the ODDT data source consists of incoming links to 
arbitrary data, some recognition of particular data types is 
valuable. From early versions, links from molsync.com™ 
and chemspider.com^"^ have been recognized, and their 
chemical data extracted even though the links refer to stan- 
dard HTML pages. A number of other open resources are 
available on the internet, and extracting the data in ma- 
chine readable form, as well as standard renderable web 
pages, increases the value of the ODDT project. New recog- 
nition types will be added regularly, and some explicit col- 
laborations are to be anticipated and we welcome addi- 
tional suggestions. 

The server for the ODDT project initially only stored mini- 
mal information - not much more than the relevant tweets 
that make up the data collection. The payload content is 
typically referred to by way of a link, which is passed 
through to the client and downloaded as necessary. In 
a later version, chemical data (structures, reactions, proper- 
ty and activity data) will be parsed by the server, and 
stored within the ODDT database. This growing chemical 
data warehouse will provide fast access to associated 
chemical content, and searching capabilities. There are 
a vast number of value added features that can be built on 
top of such data. The rate limiting step is ensuring that the 
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database has sufficient capacity, and has sufficiently high 
performance to handle a growing data warehouse, which 
can sustain complex queries. This may require the addition 
of extra servers, which would be expedited by securing 
a source of funding for the project. This is another of our 
aims with the crowdfunding through IndieGoGo. 

Later versions of the software could integrate with other 
cheminformatics and drug discovery related apps to be de- 
veloped (e.g. structure searching, activity data extraction, 
structure-activity series creation, automated model build- 
ing, docking against known targets, pharmacophore hy- 
pothesis generation etc.). One could imagine that the app 
will be able to function as a virtual research organization in 
which it becomes the primary tool for sourcing data, shar- 
ing data, models and hypotheses as well as selective collab- 
oration with other scientists in public through the open in- 
ternet, or in private behind a company firewall. ODDT could 
then be considered a foundation for crowdsourced drug 
discovery. 

With investment resources we intend to adapt the prod- 
uct for licensing to pharmaceutical research institutions for 
internal use and make it a closed app. Private documents 
can then be mixed with external "open" data to create a val- 
uable internal resource. We will leverage the ongoing de- 
velopment of software for analysis of documents and 
chemical data to provide informatics capabilities for con- 
tent discovery and extraction. These capabilities will com- 
plement then rapidly displace existing chemical database 
software as mobile and cloud computing becomes the ac- 
cepted paradigm for drug discovery. 

ODDT can already be used by scientists to make others 
aware of, for example, molecules that are described in a re- 
search paper that others may not be aware of because of 
accessibility issues with a paper. ODDT could also be used 
as a repository for preprints on a particular topic, thus 
making data and research more freely available. We have 
demonstrated the viability of the approach by tweeting in 
content ourselves and adding the #ODDT hashtag to auto- 
endorse it. The project is open to participation from 
anyone, and much of the content is derived from public 
sources, but is amenable to commercial data input (from 
publishers and pharmaceutical companies). In future ver- 
sions we imagine the user will need to be able to select 
their research topics of interest, corresponding hashtags 
and images for use in the app. 

To date we have performed this work without funding 
and made the app freely available. We would be happy for 
foundations, societies, individuals and companies to fund 
or sponsor research topics either directly or through our 
crowdfunding campaigns.""^' We hope that scientists will in- 
creasingly tweet links to papers, blogs, slides and molecules 
of relevance to scientists following the topic hashtags. 
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